Reproducible Research 2.0

Dublin Data Science

Mick Cooney

2020-05-21

What is Reproducible Research?

An Example?

Anecdotes???

Meetups


Dublin Data Science


Insurely You’re Joking (Dublin|London)


Anyone who will have me

Talks


Workshops


Informal help

What To Do?

Reproducibility

Reproducible Research


  1. Source Control
  2. Workbooks
  3. Makefiles
  4. Containers and Docker

Source Control

git


Track changes


Collaboration

Issue tracking


Branch management

Workbooks

What is research?

Outcome unknown…

Try lots of stuff…

Record of work

Jupyter vs Zeppelin vs Rmarkdown

NOT for production

Makefiles

Dependency Management

make and Makefiles

Directed Acyclic Graph (DAG)

sysadmin tasks
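
The DAG idea can be sketched outside of make. The following is a minimal Python illustration, not how make itself is implemented: each target is mapped to its prerequisites, and a topological sort gives an order in which every prerequisite is built before anything that depends on it. The file names (raw_data.csv, clean_data.rds, report.html and so on) and both function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical targets mapped to the prerequisites they depend on.
DEPENDENCIES = {
    "report.html": {"report.Rmd", "clean_data.rds"},
    "clean_data.rds": {"raw_data.csv", "clean.R"},
}


def build_order(dependencies):
    """Return all nodes with every prerequisite ahead of its dependants."""
    return list(TopologicalSorter(dependencies).static_order())


def stale_targets(dependencies, changed):
    """Return every target that needs rebuilding after `changed` files are edited."""
    stale = set()
    # Walking in build order means a stale prerequisite is seen before its dependants.
    for node in build_order(dependencies):
        prereqs = dependencies.get(node, set())
        if prereqs & (set(changed) | stale):
            stale.add(node)
    return stale
```

In this sketch, editing raw_data.csv marks both clean_data.rds and report.html as stale, while editing report.Rmd only affects report.html. make reaches the same kind of conclusion by comparing file modification times.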

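# PROJECT_USER, PROJECT_NAME and DOCKER_USER are assumed to be defined elsewhere.
IMAGE_TAG=${PROJECT_USER}/${PROJECT_NAME}

CONTAINER_NAME=repro_research

# These targets are commands rather than files, so declare them phony.
.PHONY: render-html docker-build-image docker-run

# Render the workbook to HTML. Recipe lines must start with a tab.
render-html: ${PROJECT_NAME}.Rmd
	Rscript -e 'rmarkdown::render("${PROJECT_NAME}.Rmd")'

# Build the Docker image from the Dockerfile in this directory.
docker-build-image: Dockerfile
	docker build -t ${IMAGE_TAG} -f Dockerfile .

# Start the container in the background with the project directory mounted
# read-write and port 8787 published.
docker-run:
	docker run --rm -d \
	  -p 8787:8787 \
	  -v "${PWD}":"/home/${DOCKER_USER}/${PROJECT_NAME}":rw \
	  -e USER=${DOCKER_USER} \
	  -e PASSWORD=quickpass \
	  --name ${CONTAINER_NAME} \
	  ${IMAGE_TAG}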

Containers and Docker

Rewind

Quitting from lines 272-288 (10_carinspricing_exploration.Rmd) 
Error in `[.tbl_df`(policyprop_dt, claim_count > 0) : 
  object 'claim_count' not found
Calls: <Anonymous> ... ggplot -> [ -> [.grouped_df -> NextMethod -> [.tbl_df

Execution halted

Docker

Lightweight containers


Library versioning
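
Library versions can also be checked from inside an analysis. As a rough sketch in Python (the talk's own tooling is R and Docker), the snippet below compares a set of pinned versions against what is installed and reports any drift. The function names and the idea of keeping pins in a plain dictionary are illustrative assumptions, not part of the talk.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_drift(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    drift = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            drift[package] = (expected, found)
    return drift
```

An empty result means the environment matches the pins. A container image fixes these versions once at build time, so a check like this is mostly useful when running outside the container.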

Issues

APIs (Yahoo! Finance)
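
One reading of this issue is that a container pins the software but not the data an external API returns. A possible mitigation, sketched here in Python under that assumption, is to save the first response to disk and reuse it on later runs. The function names, the JSON format and the `fetch` callable are all hypothetical; no real Yahoo! Finance client is used.

```python
import hashlib
import json
from pathlib import Path


def load_or_snapshot(path, fetch):
    """Return data cached at `path`, calling `fetch()` only if no snapshot exists."""
    path = Path(path)
    if not path.exists():
        # First run: store the response so later runs see identical data.
        payload = json.dumps(fetch(), sort_keys=True)
        path.write_text(payload, encoding="utf-8")
    return json.loads(path.read_text(encoding="utf-8"))


def snapshot_checksum(path):
    """Return the SHA-256 hex digest of the snapshot file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

Recording the checksum alongside the analysis makes it easy to notice if the snapshot is ever replaced.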

Summary

Aspects


  1. Source Control
  2. Workbooks
  3. Makefiles
  4. Containers and Docker

Source Control


https://web.archive.org/web/20180924182907/http://hginit.com/


https://ohshitgit.com/


https://git-scm.com/book/en/v2

Workbooks


https://www.dataquest.io/blog/jupyter-notebook-tutorial/


https://zeppelin.apache.org/docs/0.5.5-incubating/


https://rmarkdown.rstudio.com/articles_intro.html

Makefiles


http://matt.might.net/articles/intro-to-make/


https://edoras.sdsu.edu/doc/make.html


https://www.gnu.org/software/make/manual/html_node/index.html

Docker


http://ropenscilabs.github.io/r-docker-tutorial/


https://docker-curriculum.com/


https://docs.docker.com/get-started/

Extras


Software Carpentry


https://gitlab.com/ecohealthalliance/drake-gitlab-docker-example


Questions?


Email:


GitHub: https://github.com/kaybenleroll/data_workshops